The following is an analysis of a dataset composed of white wine samples. The goal is to examine each variable on its own, as well as in relation to the others, and see what interesting relationships exist.
## [1] 4898 13
## 'data.frame': 4898 obs. of 13 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7 6.3 8.1 7.2 7.2 8.1 6.2 7 6.3 8.1 ...
## $ volatile.acidity : num 0.27 0.3 0.28 0.23 0.23 0.28 0.32 0.27 0.3 0.22 ...
## $ citric.acid : num 0.36 0.34 0.4 0.32 0.32 0.4 0.16 0.36 0.34 0.43 ...
## $ residual.sugar : num 20.7 1.6 6.9 8.5 8.5 6.9 7 20.7 1.6 1.5 ...
## $ chlorides : num 0.045 0.049 0.05 0.058 0.058 0.05 0.045 0.045 0.049 0.044 ...
## $ free.sulfur.dioxide : num 45 14 30 47 47 30 30 45 14 28 ...
## $ total.sulfur.dioxide: num 170 132 97 186 186 97 136 170 132 129 ...
## $ density : num 1.001 0.994 0.995 0.996 0.996 ...
## $ pH : num 3 3.3 3.26 3.19 3.19 3.26 3.18 3 3.3 3.22 ...
## $ sulphates : num 0.45 0.49 0.44 0.4 0.4 0.44 0.47 0.45 0.49 0.45 ...
## $ alcohol : num 8.8 9.5 10.1 9.9 9.9 10.1 9.6 8.8 9.5 11 ...
## $ quality : int 6 6 6 6 6 6 6 6 6 6 ...
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1 Min. : 3.800 Min. :0.0800 Min. :0.0000
## 1st Qu.:1225 1st Qu.: 6.300 1st Qu.:0.2100 1st Qu.:0.2700
## Median :2450 Median : 6.800 Median :0.2600 Median :0.3200
## Mean :2450 Mean : 6.855 Mean :0.2782 Mean :0.3342
## 3rd Qu.:3674 3rd Qu.: 7.300 3rd Qu.:0.3200 3rd Qu.:0.3900
## Max. :4898 Max. :14.200 Max. :1.1000 Max. :1.6600
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.600 Min. :0.00900 Min. : 2.00
## 1st Qu.: 1.700 1st Qu.:0.03600 1st Qu.: 23.00
## Median : 5.200 Median :0.04300 Median : 34.00
## Mean : 6.391 Mean :0.04577 Mean : 35.31
## 3rd Qu.: 9.900 3rd Qu.:0.05000 3rd Qu.: 46.00
## Max. :65.800 Max. :0.34600 Max. :289.00
## total.sulfur.dioxide density pH sulphates
## Min. : 9.0 Min. :0.9871 Min. :2.720 Min. :0.2200
## 1st Qu.:108.0 1st Qu.:0.9917 1st Qu.:3.090 1st Qu.:0.4100
## Median :134.0 Median :0.9937 Median :3.180 Median :0.4700
## Mean :138.4 Mean :0.9940 Mean :3.188 Mean :0.4898
## 3rd Qu.:167.0 3rd Qu.:0.9961 3rd Qu.:3.280 3rd Qu.:0.5500
## Max. :440.0 Max. :1.0390 Max. :3.820 Max. :1.0800
## alcohol quality
## Min. : 8.00 Min. :3.000
## 1st Qu.: 9.50 1st Qu.:5.000
## Median :10.40 Median :6.000
## Mean :10.51 Mean :5.878
## 3rd Qu.:11.40 3rd Qu.:6.000
## Max. :14.20 Max. :9.000
Our dataset consists of 13 variables and 4,898 observations. One of the variables (X) is only a row counter, so twelve variables provide some kind of substantive measurement.
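As a minimal sketch of how the counter column could be set aside before analysis: the data frame below holds only the first three rows of four columns, copied from the str() output above, and the name `wine` is an assumption introduced here for illustration.

```r
# First three rows from the str() output above, restricted to four columns.
white_wine <- data.frame(
  X             = 1:3,
  fixed.acidity = c(7.0, 6.3, 8.1),
  alcohol       = c(8.8, 9.5, 10.1),
  quality       = c(6L, 6L, 6L)
)

# Drop the row counter so that only substantive measurements remain.
# `wine` is a hypothetical name, not one used in the original analysis.
wine <- white_wine[, names(white_wine) != "X"]

dim(wine)
names(wine)
```

Applied to the full data, the same step would leave the twelve substantive variables.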
Quality
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.878 6.000 9.000
The median quality is 6 with a mean just below. It appears to have a roughly normal distribution with a max of 9 and a min of 3. No wines in the dataset have a quality of 0, 1, 2, or 10. The portion of wines that don’t rate as a 5, 6, or 7 appears to be very small.
Fixed Acidity
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.800 6.300 6.800 6.855 7.300 14.200
Fixed acidity appears to have a roughly normal distribution (with a slightly positive skew) with a median of 6.8 and a mean of 6.855.
Volatile Acidity
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0800 0.2100 0.2600 0.2782 0.3200 1.1000
Volatile Acidity has a mean of .2782 and a median of .2600. It appears to have a quite positive skew. Using a log_10 x scale distributes the data more normally.
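The skew here was judged by eye from a histogram. One possible way to put a number on it is a moment-based skewness. The sketch below assumes a helper named `skewness` (defined here; it is not a base R function) and uses made-up values rather than the wine data, purely to show how a log_10 transform can pull in a long right tail.

```r
# Moment-based sample skewness: third central moment divided by the
# cubed standard deviation (population form, dividing by n).
skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Illustrative values only (not from the wine data): evenly spaced on a
# log10 scale, so the raw values have a long right tail.
toy <- c(1, 10, 100, 1000, 10000)

skewness(toy)         # positive: long right tail
skewness(log10(toy))  # approximately zero: symmetric after the transform
```

The same helper could be applied to a column such as volatile acidity before and after the transform to check whether the visual impression holds.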
Citric Acid
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2700 0.3200 0.3342 0.3900 1.6600
Citric Acid has a median of .32 and a mean of .3342. The distribution appears to be fairly normal with a large amount of the data clustered around the mean and median.
Residual Sugar
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.600 1.700 5.200 6.391 9.900 65.800
Residual sugar has a median of 5.2 and a mean of 6.391. The distribution is very positively skewed, and a log_10 scale reveals what might be a bimodal distribution.
Chlorides
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00900 0.03600 0.04300 0.04577 0.05000 0.34600
Chlorides have a median of .043 and a mean of .04577. The distribution is positively skewed, and appears more normally distributed when viewed on a log_10 scale.
Free Sulfur Dioxide
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 23.00 34.00 35.31 46.00 289.00
Free Sulfur Dioxide has a median of 34.00 and a mean of 35.31. The distribution appears fairly normal.
Total Sulfur Dioxide
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.0 108.0 134.0 138.4 167.0 440.0
Total Sulfur Dioxide has a median of 134 and a mean of 138.4. The distribution is positively skewed, but appears more normal when viewed on a square root scale.
Density
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9871 0.9917 0.9937 0.9940 0.9961 1.0390
Density has a median of .9937 and a mean of .9940. The skew is slightly positive, and the distribution appears to spike in certain places.
pH
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.720 3.090 3.180 3.188 3.280 3.820
pH has a median of 3.180 and a mean of 3.188. The distribution looks normal.
Sulphates
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2200 0.4100 0.4700 0.4898 0.5500 1.0800
Sulphates have a median of .47 and a mean of .4898. The distribution is positively skewed, but appears more normal along a log_10 scale.
Alcohol
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 9.50 10.40 10.51 11.40 14.20
Alcohol has a median of 10.40 and a mean of 10.51. The skew is positive and remains so even when viewed on a logarithmic scale.
What is the structure of the dataset?
There are 4898 wines in the dataset, each with twelve features. The features are as follows: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol & quality.
For the variable quality, ten is the highest score, and zero is the lowest.
The remaining variables are measured as follows:
fixed acidity - tartaric acid in g/dm^3
volatile acidity - acetic acid in g/dm^3
citric acid - g/dm^3
residual sugar - g/dm^3
chlorides - sodium chloride in g/dm^3
free sulfur dioxide - mg/dm^3
total sulfur dioxide - mg/dm^3
density - g/cm^3
pH - rating on the pH scale
sulphates - potassium sulphate in g/dm^3
alcohol - percent by volume
What are the main features of interest?
The primary feature of interest is quality. I would like to find which variables are the primary influencers of quality. I suspect alcohol may have some correlation with quality, and believe that some kind of acidity may also correlate.
I am also curious about residual sugar, as it has what appears to be a bimodal distribution, and would like to know what that is related to.
What features do you think will help support your analysis?
As mentioned above, I believe alcohol, residual sugar, and variables related to acidity may help to illustrate what is guiding quality. I am curious about all of the variables, however, and will try plotting as many as possible.
Did you create any new variables?
I did not create any new variables. However, I do want to look at Bound Sulfur Dioxide, which is Total Sulfur Dioxide minus Free Sulfur Dioxide.
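A minimal sketch of that calculation, using the first ten rows of the two sulfur dioxide columns as they appear in the str() output above. The names `bound.sulfur.dioxide` and `free.ratio` are assumptions introduced here, not names from the original analysis.

```r
# First ten rows of the two sulfur dioxide columns, copied from the
# str() output above (units: mg/dm^3).
free.sulfur.dioxide  <- c(45, 14, 30, 47, 47, 30, 30, 45, 14, 28)
total.sulfur.dioxide <- c(170, 132, 97, 186, 186, 97, 136, 170, 132, 129)

# Bound sulfur dioxide is whatever part of the total is not free.
bound.sulfur.dioxide <- total.sulfur.dioxide - free.sulfur.dioxide

# Share of the total that is free, expressed as a ratio.
free.ratio <- free.sulfur.dioxide / total.sulfur.dioxide

bound.sulfur.dioxide
round(free.ratio, 3)
```

Because both expressions are vectorised, the same two lines would work unchanged on the full columns of the data frame.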
Unusual Distributions
A few variables looked more normally distributed when placed on a log_10 scale.
The above plot suggests that quality has a noticeable positive correlation with alcohol content. It also suggests noticeable negative correlations with density, chlorides, & total sulfur dioxide.
Other correlations appear to be:
Density & Residual Sugar (Positive)
Density & Total Sulfur Dioxide (Positive)
Residual Sugar & Total Sulfur Dioxide (Positive)
Total Sulfur Dioxide & Free Sulfur Dioxide (Positive)
Alcohol & Residual Sugar (Negative)
Alcohol & Total Sulfur Dioxide (Negative)
Alcohol & Chlorides (Negative)
Fixed Acidity & pH (Negative)
Many of these relationships seem worth investigating.
Alcohol & Quality
##
## Pearson's product-moment correlation
##
## data: white_wine$quality and white_wine$alcohol
## t = 33.858, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4126015 0.4579941
## sample estimates:
## cor
## 0.4355747
The scatterplot appears to confirm the positive correlation mentioned in the previous chart. The majority of wines appear to have a quality between 5 & 7, and the average quality appears to increase as alcohol content increases. The regression line in the scatterplot confirms this as well.
The boxplot shows that the median alcohol level increases consistently as quality level increases. This is observable for all quality levels except for 3 & 4, which may be due to the very small number of wines rated 3 & 4 in the data set.
The correlation coefficient for the two variables is .436. This confirms that within the dataset there is a moderate positive correlation.
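To show how the figures in the test output above relate to one another, here is a small sketch that starts from the reported coefficient and degrees of freedom and recomputes the t statistic and an approximate 95 percent confidence interval. It uses the standard formulas (the t statistic for a Pearson correlation and the Fisher z-transformation); the variable names are assumptions of this sketch.

```r
# Values reported in the cor.test() output above.
r   <- 0.4355747
dof <- 4896          # degrees of freedom, n - 2
n   <- dof + 2

# t statistic for testing that the true correlation is zero.
t_stat <- r * sqrt(dof) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z-transformation.
z      <- atanh(r)
margin <- qnorm(0.975) / sqrt(n - 3)
ci     <- tanh(c(z - margin, z + margin))

round(t_stat, 3)
round(ci, 4)
```

The results should land close to the reported t = 33.858 and the interval from 0.4126 to 0.4580. With nearly 4,900 observations the interval is narrow, so the estimate of a moderate correlation is fairly precise.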
Density & Total Sulfur Dioxide
##
## Pearson's product-moment correlation
##
## data: white_wine$total.sulfur.dioxide and white_wine$density
## t = 43.719, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5094349 0.5497297
## sample estimates:
## cor
## 0.5298813
The scatterplot above shows an apparent positive correlation between Density & Total Sulfur Dioxide, with density gradually increasing as the other variable does.
The correlation coefficient of .530 suggests the same.
Density & Residual Sugar
##
## Pearson's product-moment correlation
##
## data: white_wine$residual.sugar and white_wine$density
## t = 107.87, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8304732 0.8470698
## sample estimates:
## cor
## 0.8389665
The above scatterplot demonstrates the strong correlation between density & residual sugar. As residual sugar increases, the average density appears to increase as well, and the range of densities appears to narrow. The narrowing may, however, simply reflect the much greater number of samples with comparatively low residual sugar.
The regression line suggests a positive correlation within the data as well, and the correlation coefficient of .839 suggests a strong positive correlation.
Free Sulfur Dioxide over Total Sulfur Dioxide & Quality
##
## Pearson's product-moment correlation
##
## data: (white_wine$free.sulfur.dioxide/white_wine$total.sulfur.dioxide) and white_wine$quality
## t = 14.076, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1701474 0.2239834
## sample estimates:
## cor
## 0.1972141
The above scatterplot suggests a positive correlation between percentage of free sulfur dioxide and quality, though the points themselves don’t really reveal how strong it may be. The regression line confirms that it is indeed positive.
The boxplot shows the median percentage of free sulfur dioxide increasing for each successive quality level, though Quality 4 does not conform with this. This may be due to how lightly represented quality 3 & 4 wines are within the data set. Quality 9 also does not conform, and it also has very few data points.
The correlation coefficient for the two variables is .197. This indicates a positive correlation, but one that is also mild.
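The per-level medians that a boxplot draws can also be computed directly, which makes it easier to check whether each quality level really sits above the one before. The sketch below assumes a helper named `median_by_quality` and uses made-up values rather than the wine data.

```r
# Median of a measurement within each quality level, which is the
# centre line a boxplot draws for each group.
median_by_quality <- function(values, quality) {
  tapply(values, quality, median)
}

# Illustrative values only (not from the wine data).
toy <- data.frame(
  free    = c(10, 20, 30, 30, 40, 45),
  total   = c(100, 100, 100, 120, 100, 90),
  quality = c(5, 5, 6, 6, 7, 7)
)

toy$free.ratio <- toy$free / toy$total
median_by_quality(toy$free.ratio, toy$quality)
```

Pairing these medians with the number of wines at each level would show directly how thinly the extreme levels are represented.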
Free Sulfur Dioxide & Total Sulfur Dioxide
##
## Pearson's product-moment correlation
##
## data: white_wine$total.sulfur.dioxide and white_wine$free.sulfur.dioxide
## t = 54.645, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5977994 0.6326026
## sample estimates:
## cor
## 0.615501
A positive correlation within the dataset is apparent here. The scatterplot clearly shows that on average, free sulfur dioxide increases as total sulfur dioxide does. The regression line makes this even more clear. The correlation coefficient is .616, suggesting the correlation is medium to strong.
Total Sulfur Dioxide & Residual Sugar
##
## Pearson's product-moment correlation
##
## data: white_wine$total.sulfur.dioxide and white_wine$residual.sugar
## t = 30.669, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3776791 0.4246712
## sample estimates:
## cor
## 0.4014393
The above scatterplot is particularly interesting. The points by themselves are hard to read a trend from, since so much of the data has a comparatively low level of residual sugar. However, we can see sporadic points with higher levels, and these appear more often as total sulfur dioxide increases.
The regression line shows that the correlation is indeed positive, and with more of a slope than the points alone might suggest. The correlation coefficient of .401 suggests that this correlation is mild to medium.
Density & Alcohol
##
## Pearson's product-moment correlation
##
## data: white_wine$alcohol and white_wine$density
## t = -87.255, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.7908646 -0.7689315
## sample estimates:
## cor
## -0.7801376
The above plot shows a strong negative correlation between density and alcohol. The correlation coefficient of -.780 confirms this.
Bound Sulfur & Quality
##
## Pearson's product-moment correlation
##
## data: (white_wine$total.sulfur.dioxide - white_wine$free.sulfur.dioxide) and white_wine$quality
## t = -15.62, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2443831 -0.1910269
## sample estimates:
## cor
## -0.2178678
The regression line in the scatterplot above shows a negative correlation. This is also apparent from the points themselves, albeit not as clearly, since the quality seems to be widely distributed between 5 & 7 for most levels of Bound Sulfur Dioxide.
The boxplot also shows the median and quartile levels of bound sulfur dioxide decreasing as quality increases. This becomes less clear at low quality levels, but that may be influenced by the small sample size of such qualities.
The correlation coefficient of -.218 suggests a mild negative correlation.
Volatile Acidity & Quality
##
## Pearson's product-moment correlation
##
## data: white_wine$volatile.acidity and white_wine$quality
## t = -13.891, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2215214 -0.1676307
## sample estimates:
## cor
## -0.194723
The points alone in the scatterplot do not show much of a correlation between quality and volatile acidity. The regression line suggests a negative correlation.
The boxplot does not strongly indicate a correlation between the two variables. The median and quartile volatile acidity levels appear to go in different directions from each level to the next.
The correlation coefficient is -.195. This suggests a mild negative correlation.
All Acids & Quality
##
## Pearson's product-moment correlation
##
## data: white_wine$fixed.acidity + white_wine$volatile.acidity + white_wine$citric.acid and white_wine$quality
## t = -9.273, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1587994 -0.1037525
## sample estimates:
## cor
## -0.1313772
The correlation between overall acidity and quality seems to be subtle but noticeable and mostly negative. The regression line has a mildly negative slope.
This is observable in the boxplot above as well, in which median and quartile overall acidity decreases regularly as quality level goes up, save for quality level 9, which has very low representation within the data set.
The correlation coefficient of the two variables is -.131, which describes a mild negative correlation.
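For reference, the combined acidity used in this test is the row-wise sum of the three acid measurements, all of which are in g/dm^3. A minimal sketch with the first three rows from the str() output above; the names `acids` and `total.acidity` are assumptions of this sketch.

```r
# First three rows of the three acid columns from the str() output
# above (all in g/dm^3).
acids <- data.frame(
  fixed.acidity    = c(7.0, 6.3, 8.1),
  volatile.acidity = c(0.27, 0.30, 0.28),
  citric.acid      = c(0.36, 0.34, 0.40)
)

# Combined acidity per wine: the row-wise sum of the three measurements.
total.acidity <- rowSums(acids)
total.acidity
```

Fixed acidity is by far the largest of the three terms, so the combined measure is likely to track fixed acidity closely.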
What relationships did you observe?
Alcohol is the individual variable that appears to correlate most strongly with quality. This is most observable for quality ratings where we have lots of samples, and for the levels where this is not true, the alcohol levels vary widely.
Density and Total Sulfur Dioxide appear to correlate positively as well.
Density and Residual Sugar have a very strong positive correlation, and the variance of density appears to decrease as residual sugar increases. Also, a great number of wines in the sample have a relatively low residual sugar, which may have influenced these findings.
Quality also appears to correlate positively with the ratio of free to total sulfur dioxide. As with many of our findings, this is less observable where there are not very many wines of a given quality level; in this case, qualities 3 & 9 go against the trend.
Free and Total Sulfur Dioxide have a very clear positive correlation. It also looks as though the variance of free sulfur dioxide increases as total sulfur dioxide increases.
Density and alcohol appear to have a negative correlation, with what appears to be a fairly consistent variance across alcohol levels.
Lastly, there appears to be a negative correlation between the level of all measured acids combined and quality.
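What interesting features were observed?
The relationship between free and total sulfur dioxide continues to be interesting. I am particularly interested in the way the variance of free sulfur dioxide increases as total sulfur dioxide increases, and I am curious to see how quality is distributed across this relationship.
What are the strongest relationships observed?
Among the relationships involving quality, alcohol has the strongest correlation (.436, moderate and positive). The ratio of free to total sulfur dioxide also correlates positively with quality, though more mildly (.197). Across all of the pairs examined, the strongest relationships are between density and residual sugar (.839) and between density and alcohol (-.780).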
Free vs. Total Sulfur Dioxide
There is a pretty clear trend here. The upper part of the shape created by the chart is a distinctly darker color than the lower part. This suggests that as the ratio of free sulfur dioxide to total sulfur dioxide increases, the quality increases as well. This corroborates our earlier findings between the variables in bivariate analysis.
The regression lines generally appear to be higher for higher quality as well, and with the exception of quality level 9, appear to have similar slopes. Quality levels 7, 6, and 5 have regression lines that are close together. For the portion of the plot where most data is present, 8 has more free sulfur dioxide at similar levels of total sulfur dioxide than 7, which in turn has more than 6 and 5. Levels 3 and 4 have much less free sulfur dioxide than the others, though this may have to do with their low representation.
Density & Alcohol
This chart once again illustrates the negative correlation between density and alcohol. The color of the chart once again affirms that quality rises as alcohol does. However, quality does not appear to have much of a relation to density when alcohol level remains constant.
Many of the regression lines for different quality levels have similar slopes, and generally darker lines (higher qualities) appear to have higher density at the same alcohol level than lighter lines. Quality levels 9 and 3 appear to be exceptions to this, though this is probably related to the low representation of these samples.
Total Sulfur Dioxide & Residual Sugar
Not much of a trend is visible here from the scatterplot, beyond what we were already able to observe in bivariate plots. The regression lines do not appear to line up in any particular order either.
Total Sulfur Dioxide & Density
This plot indicates that, at levels of total sulfur dioxide typical of the white wines in the data set, higher quality wines tend to have a lower density level than lower quality wines of the same total sulfur dioxide level.
Density & Residual Sugar
Once again we see the clear positive correlation between density and residual sugar. This also shows that quality seems to decrease as density increases when residual sugar remains constant.
The regression lines indicate that at levels of residual sugar common to the dataset, higher quality wines tend to have lower density than lower quality level wines. Quality 9 and 3 wines don’t seem to conform as nicely to this rule, though this may be due to low representation.
What relationships did you observe?
Among wines in the sample with a similar amount of residual sugar, Quality seems to increase as Density decreases.
Quality also seems to increase as free sulfur dioxide increases while total sulfur dioxide remains constant.
As noted before, Quality tends to increase as alcohol increases, but even though alcohol is related to density, quality does not seem to relate to density when alcohol is held constant.
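The "held constant" observations above were read off colored scatterplots. One possible numerical check, which is a different technique from the one used in this analysis, is a partial correlation: remove the linear effect of the third variable from both of the others and correlate what remains. The sketch below assumes a helper named `partial_cor` and uses simulated data, not the wine data.

```r
# Partial correlation of x and y given z: correlate what is left of x
# and y after a linear fit on z has been removed from each.
partial_cor <- function(x, y, z) {
  cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
}

# Simulated data (not the wine data): x and y each depend on z but have
# no direct link to one another.
set.seed(1)
z <- rnorm(500)
x <- z + rnorm(500)
y <- -z + rnorm(500)

cor(x, y)             # clearly negative, driven by the shared z
partial_cor(x, y, z)  # much closer to zero once z is held constant
```

On the wine data, a call along the lines of `partial_cor(quality, density, alcohol)` would be the analogue of asking whether quality relates to density once alcohol is accounted for. It captures only linear effects, so it complements the plots rather than replacing them.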
Plot One
Description One
The quality ratings of wines in the dataset have a distribution that roughly resembles a normal distribution. The median and modal quality rating is six. Of the nearly 4900 samples in the dataset, around 2200 have that rating.
The vast majority of wines have a rating between five and seven. Wines in this range account for over 4000 samples. Wines rated three and nine make up a very small portion of the dataset (below a hundred samples each). There are no wines rated zero, one, two, or ten in the dataset.
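Counts like these can be tabulated rather than read off the histogram. A minimal sketch, assuming a helper named `share_in_range` and a made-up vector of ratings (not the wine data):

```r
# Share of wines whose rating falls within [low, high].
share_in_range <- function(quality, low, high) {
  mean(quality >= low & quality <= high)
}

# Illustrative ratings only (not from the wine data).
toy_quality <- c(3, 5, 5, 6, 6, 6, 6, 7, 7, 8)

table(toy_quality)                 # count of wines at each rating
share_in_range(toy_quality, 5, 7)  # proportion rated five to seven
```

Run against the quality column of the real data, the table gives the exact count at each rating, and the helper gives the share that falls in the middle band.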
In addition to suggesting that a large portion of white wines score somewhere in the middle of the spectrum, this plot indicates the other variables in our dataset will be more descriptive of white wines with a mid-level quality.
Plot Two
##
## Pearson's product-moment correlation
##
## data: white_wine$quality and white_wine$alcohol
## t = 33.858, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4126015 0.4579941
## sample estimates:
## cor
## 0.4355747
Description Two
The above plot and data clearly indicate a positive correlation between Alcohol Content and Quality Rating.
Wines rated at six appear to have a broad range of alcohol content levels from around nine percent to thirteen percent. Wines rated at seven have a broad range as well, but it appears to be centered more between ten percent and thirteen percent. Wines rated as five are largely between eight point five and eleven percent alcohol content.
Wines of four and eight are not widely represented, but the majority of wines ranked as rating level eight appear to be between eleven and thirteen point five percent alcohol content. Wines of rating level four appear to mainly be distributed between nine and eleven percent alcohol.
The regression line in the scatterplot shows a clear positive slope that aligns with the distribution of the points in the plot. The correlation coefficient of .436 indicates that Quality Rating and Alcohol Content have a positive, moderate correlation. As Quality Rating is purely an output variable, this is consistent with Alcohol Content having a moderate degree of influence on Quality Rating, though the correlation on its own does not establish that.
Plot Three
##
## Pearson's product-moment correlation
##
## data: white_wine$total.sulfur.dioxide and white_wine$free.sulfur.dioxide
## t = 54.645, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5977994 0.6326026
## sample estimates:
## cor
## 0.615501
Description Three
The above plot shows the relationship between Total Sulfur Dioxide and Free Sulfur Dioxide for wines between quality level 4 & 8. Levels 3 & 9 were omitted due to very low representation.
The correlation coefficient of .616 indicates that overall, Free Sulfur Dioxide and Total Sulfur Dioxide have a significant positive correlation.
Within the plot, darker points tend to be near the higher portion, which indicates that, generally, at a given level of total sulfur dioxide, wines with higher level of free sulfur dioxide will tend to have a higher quality. Put differently, a higher percentage of free sulfur dioxide appears to correlate with a higher quality.
This is observable from the regression lines as well. In the portion of the plot where most points are present, regression lines for higher qualities tend to indicate higher levels of free sulfur dioxide for a given level of total sulfur dioxide.
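The per-quality regression lines described here can be approximated by fitting a separate linear model within each quality level after dropping the sparsely represented levels. The sketch below is one way to do that under stated assumptions: the helper name `fit_by_quality` is introduced here, the values are made up rather than taken from the wine data, and the plotting code behind the figure may have produced its lines differently.

```r
# Keep quality levels 4 to 8, then fit free ~ total separately within
# each level and collect the intercepts and slopes.
fit_by_quality <- function(wine) {
  kept <- wine[wine$quality >= 4 & wine$quality <= 8, ]
  fits <- lapply(split(kept, kept$quality), function(d) {
    coef(lm(free.sulfur.dioxide ~ total.sulfur.dioxide, data = d))
  })
  do.call(rbind, fits)
}

# Illustrative values only (not from the wine data).
toy <- data.frame(
  total.sulfur.dioxide = c(100, 150, 200, 100, 150, 200, 120),
  free.sulfur.dioxide  = c(20, 30, 40, 30, 45, 60, 10),
  quality              = c(5, 5, 5, 7, 7, 7, 3)
)

fit_by_quality(toy)
```

Each row of the result is one quality level, so comparing the fitted values at a typical level of total sulfur dioxide gives a numerical version of the ordering seen in the plot.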
This data contains nearly 5,000 wine samples, each with a dozen variables. I began my analysis by looking at the distribution of each variable on its own, and then began looking at them in conjunction with each other to see relationships and answer interesting questions.
As I was most interested in the quality of the wine, two notable relationships that presented themselves were the correlation between quality and alcohol (this correlation was both clear and positive), as well as the correlation between the ratio of free to total sulfur dioxide and quality. The latter was especially interesting, since on their own, neither free nor total sulfur dioxide appeared to correlate strongly with quality.
Indeed, as someone with only a passing interest in wine, one surprising element of the data was how few of the variables appeared to have a strong correlation on their own with quality. Many of the bivariate plots earlier in the analysis show this.
One point of difficulty within the dataset was the distribution of quality levels. As my primary initial question had to do with what influenced quality, I felt limited at first by the fact that three quality levels (5, 6, & 7) made up the substantial bulk of the data. That four levels (0, 1, 2, & 10) were not represented at all added to this as well.
Because of this, for issues focusing on quality, I decided to focus largely on wines between quality 4 & 8. While this did not allow me to glean information about higher and lower quality wines, it did allow me to get a sense of the story for wines that made up the bulk of the sample, and likely the types of wines consumers would be used to finding, and I think it was successful in that regard.
I think I was also able to find success in using different types of plots and methods to learn about the dataset in different ways. Initially, many of my scatterplots did not appear to be descriptive to me. Regression lines made the shape of data clearer, and employing correlation coefficients helped to quantify what I thought I was seeing in many cases.
I feel confident that my analysis left me with a better understanding of what influences quality in white wine. As a follow-up, I would like to explore the red wine set Udacity has uploaded, and see if the factors influencing quality are similar. Eventually, looking at other types of alcoholic drinks to see if alcohol content influences perceived quality in the same way would be interesting as well.